TEXT TO SPEECH SYNTHESIZER 
The present invention relates to a text to speech synthesizer capable of reading out 
text aloud for exchanging information such as e-mails and networked news articles as 
synthesized speech. 

BACKGROUND OF THE INVENTION 
With the rapid expansion in recent years in the number of people using the internet, 
portable information terminals such as personal computers, portable telephones, PDAs 
and pagers have rapidly become widespread as ways of connecting to the internet in 
business, at home, and in schools. One reason for this is the existence of message 
exchange systems such as e-mail and internet news systems. New kinds of message 
exchange systems that integrate various message systems have also recently started to 
appear, such as systems that convert messages (such as e-mail) into speech for transfer 
to a telephone, systems that convert messages into speech at a terminal and read them 
out, systems where notification of the arrival of an e-mail is outputted to a pager in the 
possession of the destination user, and systems where image information from a fax 
machine is transmitted as multimedia e-mail to information terminals. These services 
centering on messages such as e-mail and speech synthesis have brought about a further 
increase in users. 

An essential function of such message exchange systems is to be able to read out 
e-mail and networked news on a telephone. However, such e-mail and networked news 
is composed on the assumption that the recipient will read it visually, and cases where 
information is included that cannot be converted to speech are common. For example, 
characters indicating a facial expression (also referred to as pictographs, ASCII art and 
glyphs) can be used in order to convey subtle feelings and facial nuances of the writer 
in e-mails or networked news. 

For example, FIG. 20(b) is a view showing an example of a face inputted as a 
facial expression. Numeral 291 in FIG. 20(b) is an example of typical e-mail text 
inputted using simple facial characters. In FIG. 20(b), numeral 292 represents a facial 
character made using the parentheses "(" and ")" and the symbols "^" and "_" and 
meaning "smile", and numeral 293 is a facial character made from the parentheses "(" 
and ")" and the symbols "_" and "o" and meaning "sorry!". 

When this kind of character string is read out in related text to speech converter 
systems, the characters are read out one at a time, which means that the feelings of the 
sender are not conveyed to the recipient. 

Related technology for enabling text to speech conversion of facial characters is 
cited in published unexamined Japanese Patent Application No. Hei. 11-305987. In this 
reference, "facial expressions" are represented as being "pictographs". The following is 
a description of technology disclosed in this reference. 

FIG. 20 is a view describing related technology disclosed in this document, with 
FIG. 20(a) showing the overall configuration of a text to speech synthesizer 281. The 
text to speech synthesizer 281 comprises a text input device 282 for receiving text input 
from outside of the apparatus, a facial character extraction device 283 for searching for 
facial characters within the input text 287, a facial character reading converter 284 
for converting the retrieved facial characters into readings in accordance with a facial 
character reading table 285, and a speech synthesizer 286 for converting the input text 
287 converted by the facial character reading converter 284 into synthesized speech. 

Table 1 is a view of the facial character reading table 285. 



Table 1 

Facial characters    Reading 
(^_^)                "smile" 
(_o_)                "sorry!" 



The facial character reading table 285 is in a format where the "facial character" 
and the reading when synthesized as speech are held as a single group. 

FIG. 20(b) also shows the text 294 obtained by converting the facial characters in the 
inputted text 291 into their readings. 

In the following, a description is given of the operation of the text to speech 
converter of the related art. When text data is inputted to the text input device 282, the 
facial character extraction device 283 searches for facial characters by referring to facial 
character data recorded in the facial character reading table 285. In the example in FIG. 
20(b), two facial characters, 292 and 293, are retrieved. Next, the facial character 
reading converter 284 converts locations of the facial characters into readings in 
accordance with the facial character reading table 285 (refer to table 1) for output as text 
294. Finally, the speech synthesizer 286 converts the converted text data 294 into 
synthesized speech. As a result of the above processing, facial character portions that 
cannot conventionally be put into the form of speech or are put into speech in the form 
of symbol names one character at a time can be read out as synthesized speech. 
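
The related-art processing described above amounts to a table lookup followed by string replacement. The following is a minimal sketch of that idea only; the table contents, the exact facial character strings and the function name `convert_faces` are assumptions made for illustration and are not taken from the reference.

```python
# Hypothetical contents of the facial character reading table 285.
# The exact symbol strings are assumptions for illustration.
FACE_READINGS = {
    "(^_^)": "smile",
    "(_o_)": "sorry!",
}


def convert_faces(text, table=FACE_READINGS):
    """Replace every registered facial character in text with its reading."""
    # Longest entries first, so that a long face is not partially
    # replaced by a shorter registered face contained in it.
    for face in sorted(table, key=len, reverse=True):
        text = text.replace(face, table[face])
    return text
```

A facial character that is not registered, such as (T_T) in this sketch, passes through unchanged, which illustrates why every new facial character would have to be added to the table.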

In the related art disclosed in the reference described above, facial character 
portions can be converted to readings that can be synthesized as speech by providing a 
table for registering the facial characters and a device for retrieving and extracting the 
facial characters from the text data and then converting them. 

However, the following problems exist with the related art. 



(1) Registration of facial characters puts pressure on resources. Namely, each facial 
character to be read out must be additionally registered as a listing, which increases 
both the table size (amount of memory used) and the load placed on the search 
processing. This is also linked to increases in production costs in environments where 
resources are limited, such as in portable information terminals. 

(2) Facial characters are also created independently by users, and the number of 
types therefore continues to increase. According to the related art, there is no means for 
reading out facial characters other than those recorded in the facial character reading 
table, so the table must be extended each time new facial characters appear. 
However, there is also a limit on the number of facial characters that can be recorded 
due to limits with regards to resources. 

SUMMARY OF THE INVENTION 

It is the object of the present invention to provide a text to speech synthesizer 
capable of reading out as yet unknown facial characters in an environment of limited 
resources while keeping increases in memory size to a minimum. 

In order to achieve this, a text to speech synthesizer of the present invention 
comprises a text analyzer for analyzing Japanese text data, a facial character reading 
assignment unit for assigning facial character readings to character string portions of 
text analysis results determined to correspond to facial characters, and a speech 
synthesizer for outputting synthesized speech based on the analysis results of the text 
analyzer. The facial character reading assignment unit is constituted by a facial 
character determining unit for determining whether or not a symbol is a symbol 
constituting a facial character using an outline symbol table, a characteristic extraction 
unit for extracting characteristic symbols used in facial characters from character strings 
determined to be facial characters, and a reading selection unit for outputting readings 
allotted to the extracted reading numbers and facial character position data. Here, 
readings are assigned to the facial character strings according to the number of times 
characteristic symbols appear in facial characters. 

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 is a view of an overall configuration for a text to speech synthesizer. 
FIG. 2 is a structural view of a facial character reading assignment unit of the first 
embodiment. 

FIG. 3 shows a flowchart of the process of a facial character determining unit. 
FIG. 4 shows a flowchart of the process of a characteristic extraction unit. 
FIG. 5 shows an example of text data to be passed to the reading assignment unit. 
FIG. 6 shows an example of output of the facial character determining unit. 
FIG. 7 is a structural view of a facial character reading assignment unit of the 
second embodiment. 

FIG. 8 is a view of a configuration for a characteristic extraction unit. 
FIG. 9 is a conceptual view of a vector table. 

FIG. 10 shows an example of facial character determination processing results. 
FIG. 11 shows an example of a frequency vector. 
FIG. 12 shows an example of a selected typical vector. 

FIG. 13 is a structural view of a facial character reading assignment unit of the 
third embodiment. 

FIG. 14 is a view of a configuration for a characteristic extraction unit. 



FIG. 15 shows an example of a vector table. 
FIG. 16 shows an example of facial character determination results. 
FIG. 17 shows an example of a frequency vector. 
FIG. 18 shows an example of a frequency vector after dim processing. 
FIG. 19 shows an example of a selected typical vector. 

FIG. 20 is a view describing the related art. 



DETAILED DESCRIPTION OF THE INVENTION 
The following is a description with reference to the drawings of an embodiment of 
a text to speech synthesizer of this invention. Each drawing is merely shown in a 
simplified manner to such an extent that the invention may be clearly understood. 

First Embodiment 

FIG. 1 is a view showing an overall configuration of a text to speech synthesizer 
of the present invention. The text to speech synthesizer comprises a text analyzer 11 for 
performing analysis of Japanese on text data 14, a speech synthesizer 13 for receiving 
results outputted by the text analyzer and outputting synthesized speech 15, and a facial 
character reading assignment unit 12 provided at the text analyzer 11, for receiving text 
data determined to not yet be in the dictionary, determining whether or not facial 
characters are present, and, when facial characters are present, assigning readings to the 
facial characters and detecting the facial character position. 

As shown in FIG. 2, the facial character reading assignment unit comprises a text 
buffer 31 for receiving text data 24 and housing this text data 24, a facial character 
determining unit 21 for determining whether or not the housed data fulfills facial 
character conditions using an outline symbol table 25, extracting outline position data 
26, and outputting this position, a characteristic extraction unit 22 for extracting 
symbols used in facial characters from inputted text data and outputting correspondingly 
assigned reading numbers 28 and outline position data, and a reading selector 23 for 
receiving the reading numbers and outline position data, and acquiring and outputting 
readings 30 allotted to the numbers from a reading table 29 together with the facial 
character position (that is, the start and end outline positions in the text data). 

Table 2 shows an example of an outline symbol table, with right outline symbols 
and left outline symbols respectively being registered. 



Table 2 

Left outline symbol    Right outline symbol 
(                      ) 
{                      } 
[                      ] 



Table 3 shows an example of a characteristic symbol table. Symbols that are most 
commonly used in locations corresponding to eyes for ten types of facial characters are 
listed on the left side of the table. Unique numbers (reading numbers) corresponding to 
readings for cases where these symbols are used for both eyes are listed on the right 
side of the table. For example, when the symbol "^" is used for both eyes, this indicates 
a facial character such as "smile" or "smiley face", to which the reading number 1 is 
allotted. Table size can be suppressed to a greater extent than in the related art because 
whole facial character patterns are not stored. Instead, just the characteristic symbols 
are listed, and the reading character strings are separated from the characteristic symbol 
table into a separate table referred to as a reading table. 



Table 3 

Symbol          Reading number 
^               1 
(illegible)     2 
.               3 
T               4 
X               5 
+               5 
(illegible)     1 
n               1 
*               2 
(illegible)     4 



In an actual implementation, the reading numbers exist only as table offset values. For 
example, reading number 1 corresponds to the reading (smiling), as shown in table 4. 



Table 4 

Reading number    Reading 
1                 smiling 
2                 whoops 
3                 Oh dear 
4                 Boo-hoo! 
5                 I give up 



The following is a description of the operation of the first embodiment. First, the 
overall operation of the text to speech synthesizer is described. The text analyzer 11 
performs morphological analysis in order to output intermediate language (typically 
consisting of katakana characters and some synthesis parameters) from the inputted text 
data. In this morphological analysis, words are sectioned up using a Japanese dictionary 
and grammatical rules, and word information such as readings and accents for words is 
assigned. Because facial characters included in the text data are not listed in the 
dictionary, readings have to be assigned to them separately. Text for facial character 
portions is therefore outputted to the facial character reading assignment unit 12. 

An example of this text data is shown in FIG. 5. Here, analysis of the portion 
"looking forward to this evening's party!" in FIG. 5 is complete. The portion indicated 
by numeral 81 indicates a location where words cannot be found. 

In the following, a description is given with reference to FIG. 2 of the operation of 
the facial character reading assignment unit of the first embodiment. First, processing of 
the facial character determining unit is described. When the text data 24 is sent from the 
text analyzer 11, the facial character determining unit 21 extracts outline symbols using 
the outline symbol table 25 (refer to table 2) and makes a determination as to whether or 
not facial characters are present. 

This determination is performed in the following manner. 

(determination condition 1) The presence of a character string sandwiched by 
pre-registered outline symbols. 

(determination condition 2) The number of characters between the outline symbols 
being K or less (where K=5). 

When the results of the determination are that facial characters are present, the 
position of the extracted outline symbols (start and end positions) and the text data 24 
are sent to the characteristic extraction unit 22. 

Specific processing performed by the facial character determining unit 21 is 
described with reference to the flowchart of FIG. 3. 



(A1) Processing starts from S in FIG. 3 and proceeds so as to finish at E1 or E2. 

(A2) A scanning pointer p is set to the left end of the inputted text (S1). 

(A3) A determination is made as to whether or not the scanning pointer p has 
reached the right end of the data (S2). 

(A4) If the determination results for (S2) are YES, processing proceeds to (A16), 
and if NO, processing proceeds to (A5). 

(A5) A determination is made as to whether the character indicated by the scanning 
pointer p is listed as a left outline symbol. If listed, it is taken that facial characters 
may be present and processing proceeds to (A6). If not listed, the scanning pointer p 
advances by one character, and processing returns to (A3) (S3, S4). 

(A6) The character number counter "cnt" is initialized to 0 (S5). 

(A7) The current position of the scanning pointer is stored in a left outline 
character buffer ps (S6). 

(A8) The scanning pointer p is advanced by L characters (where, for example, L=2). 
The value L=2 assumes the case where the content inside the outline is two characters, 
which is the minimum number of characters for configuring facial characters (S7). 

(A9) The scanning pointer p advances by one character (S8). 

(A10) One is added to the character number counter "cnt" (S9). 

(A11) A determination is made as to whether or not the scanning pointer p has 
reached the end of the text. If the end has been reached, processing proceeds to (A16). 
If not, processing proceeds to (A12) (S10). 

(A12) A determination is made as to whether or not the character number counter 
"cnt" is less than or equal to a threshold value K. When less than or equal to K, 
processing proceeds to (A13), and when K is exceeded, processing proceeds to (A16). In 
this processing, the facial character determination conditions are based on the assumption 
that facial characters constructed from a large number of characters are not allowed. The 
value of K in this case is experimentally taken to be K=5 (S11). 

(A13) A determination is made as to whether or not the character pointed to by 
the scanning pointer p is in the right outline symbol table. When this character is 
determined to be a right outline symbol, processing proceeds to (A14). When it is not, 
processing returns to (A9), and extraction of the outline symbols is repeated (S12). 

(A14) The value of the current scanning pointer p is stored in the right outline 
symbol buffer pe (S13). 

(A15) If E1 is reached, ps and pe, extracted as outline position data 26, are sent 
together with the text data 24 to the characteristic extraction unit 22. 

(A16) If E2 is reached, the facial character conditions are not fulfilled, and 
results are sent to the text analyzer 11 without assigning a reading (S14). 
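
The steps (A2) to (A16) can be sketched in code as follows. This is a minimal sketch under stated assumptions and not the implementation of the embodiment: the function name `find_face`, the use of Python sets for the outline symbol table, and the reading of step (A8) as "advance the pointer by L characters" are all assumptions.

```python
LEFT_OUTLINES = {"(", "{", "["}     # left column of table 2
RIGHT_OUTLINES = {")", "}", "]"}    # right column of table 2


def find_face(text, L=2, K=5):
    """Return (ps, pe) for a facial character candidate, or None (E2)."""
    p = 0                                   # (A2)
    while p < len(text):                    # (A3), (A4)
        if text[p] not in LEFT_OUTLINES:    # (A5) not a left outline symbol
            p += 1
            continue
        cnt = 0                             # (A6)
        ps = p                              # (A7)
        p += L                              # (A8) skip the minimum content
        while True:
            p += 1                          # (A9)
            cnt += 1                        # (A10)
            if p >= len(text):              # (A11) end of text reached: E2
                return None
            if cnt > K:                     # (A12) too many characters: E2
                return None
            if text[p] in RIGHT_OUTLINES:   # (A13)
                return ps, p                # (A14), (A15): E1
    return None                             # (A16): E2
```

As in the flowchart, this sketch gives up (E2) as soon as a candidate exceeds K characters rather than resuming the scan after the left outline symbol.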

The characteristic extraction unit 22 takes the outline position data (ps, pe) 26 
obtained by the facial character determining unit 21 as input, scans the range between 
the outline symbols for the data stored in the text buffer 31, performs analysis using the 
characteristic symbol table 27 (refer to table 3), decides upon a reading number 28, 
and outputs the reading number and outline position data. 

Next, a description is given of a method for extracting symbols used as eyes using 
the characteristic symbol table 27. In the basic process, the range within the outline 
symbols is scanned in order from the left one character at a time, the number of times 
each symbol listed in the characteristic symbol table appears is counted, a symbol for 
which the number of appearances is two is determined to be eyes, and the reading 
number allotted to this symbol is sent to the reading selector 23. For example, with 
the facial characters (T__T), the symbol T is used twice and is therefore determined to 
represent eyes. However, the same symbol is not always used for both eyes, and the 
following cases are therefore also assumed. 

• When a plurality of eye symbols are used twice. 

• When both eye symbols are different. 

An example of the former case would be (*^O^*), as shown in FIG. 6. In this case, 
the symbols that are positioned more towards the center of the appearing symbols are 
determined to be eyes. The reason for this is that facial characters are commonly 
structured, in order from the center towards the outline, as "nose or mouth", "eyes", 
"cheek", "outline", so that the maker can allow the recipient to recognize that these 
characters are facial characters. 

An example of a case where both eye symbols are different is (^o—). In this case, 
it is necessary to select one of the two symbols. However, from experience, the choice 
probably does not make a large difference. Therefore, in this embodiment, the eye 
symbol that appears first is determined to be an eye. 

A flowchart of the processing at the characteristic extraction unit is shown in FIG. 4. 

(B1) Processing starts from the position S and proceeds so as to finish at E. 

(B2) The reading number N is initialized to 0 (S21). 

(B3) The scanning pointer p is set to ps (S22). 

(B4) A determination is made as to whether or not the scanning pointer p has 
reached pe. When this is so, scanning within the facial characters is assumed to have 
finished and processing proceeds to (B10). When pe has not been reached, it is assumed 
that the search within the facial characters is still in progress and processing proceeds 
to (B5) (S23). 

(B5) A determination is made as to whether or not the character designated by the 
scanning pointer p is present in the characteristic symbol table 27 (refer to table 3). 
When the character is present, it is assumed that a characteristic symbol has been 
extracted and the process proceeds to (B7). When the character is not present in the 
characteristic symbol table, the process advances to (B6) (S24). 

(B6) The scanning pointer advances by one character, and processing returns to (B4) 
(S25). 

(B7) A determination is made as to whether or not the reading number N is still 
the initial value (=0). When YES, the reading number corresponding to the extracted 
characteristic symbol is acquired from the reading table 29 (refer to table 4) and is 
stored in the reading number buffer N as the symbol appearing first. When NO, 
processing proceeds to (B8). 

(B8) The number of appearances corresponding to the extracted characteristic 
symbol is incremented by one (S28). 

(B9) When the number of appearances corresponding to the extracted 
characteristic symbol has reached two, the reading number corresponding to the 
extracted characteristic symbol is stored (S30) and processing proceeds to (B10). When 
this is not the case, processing returns to (B6), and scanning of the inside of the facial 
characters is continued. 

(B10) The value stored in the reading number buffer N is decided upon as the final 
result of the characteristic extraction unit and sent to the reading selection unit 23. 
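
The steps (B2) to (B10) can be sketched as follows. This is a minimal sketch only: the table below is an illustrative subset with assumed symbols, it assumes the reading number for a symbol is looked up directly from a symbol-to-number mapping, and the function name `extract_reading_number` is hypothetical.

```python
# Illustrative subset of a characteristic symbol table
# (symbol -> reading number); the entries are assumptions.
CHARACTERISTIC_SYMBOLS = {"^": 1, "T": 4, "X": 5, "+": 5, "n": 1, "*": 2}


def extract_reading_number(text, ps, pe, table=CHARACTERISTIC_SYMBOLS):
    """Scan text[ps:pe] and return a reading number (0 if none found)."""
    n = 0                                   # (B2) 0 means "no reading yet"
    appearances = {}
    p = ps                                  # (B3)
    while p < pe:                           # (B4) stop when pe is reached
        ch = text[p]
        if ch in table:                     # (B5)
            if n == 0:                      # (B7) remember the first symbol
                n = table[ch]
            appearances[ch] = appearances.get(ch, 0) + 1    # (B8)
            if appearances[ch] == 2:        # (B9) first symbol seen twice
                n = table[ch]
                break
        p += 1                              # (B6)
    return n                                # (B10)
```

Because scanning stops at the first symbol that appears twice, a symbol nearer the center of the face wins over an outer one without any separate position comparison.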

Table 5 is an example of the table of numbers of appearances at the point where 
processing of the facial characters shown in FIG. 6 reaches E. 

Table 5 

Eye symbols     Number of appearances 
^               2 
_               0 
—               0 
T               0 
X               0 
+               0 
(illegible)     0 
n               0 
*               1 
5               0 

This table shows that the symbol "^" appears twice. A description is now given 
of the reason the number of appearances of the symbol "*" is one. As described above, 
when a plurality of characteristic symbols are used twice, a method is employed where 
symbols further to the center are determined to be characteristic symbols. If this were 
implemented directly, in addition to counting all of the characteristic symbols within 
the scanning range from ps to pe, processing would also be necessary to determine 
which symbol (in this case, "*" or "^") is further towards the center. However, if 
scanning is carried out one character at a time from ps, the symbol nearer the center is 
always the first whose number of appearances becomes "2", and scanning ends at that 
point. For this reason, the number of appearances of the symbol "*" in table 5 remains 
"1". 

When a plurality of eye symbols with a frequency of use of two appear or when 
the symbols for both eyes are different, as described above, this embodiment selects, in 
the former case, the reading number of the eye symbol that is first to appear twice, and 
in the latter case, the reading number of the eye symbol appearing first from the left. 
However, a method may also be used where an order of priority is assigned to the eye 
symbols in advance and the symbols are then selected using this order of priority. 

The reading selection unit 23 takes the reading number 28 and outline position 
data outputted from the characteristic extraction unit 22 and the text data 24 as input, 
uses the reading table 29 (refer to table 4) to acquire the reading character string for the 
reading number, and outputs the acquired reading character string 30 and facial character 
position data (start and end outline positions in the text data) to the text analyzer 11. 

As described above, according to the first embodiment, the following results are 
anticipated. 

(1) Readings can be assigned to locations of facial expressions with a 
minimum of listings. This means that facial characters can be read out efficiently 
without unnecessary listing of characters. Further, reading out can also be 
achieved for facial characters that may come about in the future. 

(2) The reading table and the characteristic symbol table are separated and table 
size can therefore be made small. 

Second Embodiment 

The overall configuration of the second embodiment is the same as for the first 
embodiment, with the exception that the internal configuration of the facial character 
reading assignment unit 12 is different. 

FIG. 7 is a structural view of a facial character reading assignment unit 12 of the 
second embodiment. 

The facial character reading assignment unit of this embodiment comprises a 
facial character determining unit 111 for receiving text data 119 and extracting outline 
position data 120 using an outline symbol table 114, a characteristic extraction unit 112 
for making frequency vectors using the outline position data and a characteristic symbol 
table 115 and outputting an address of the frequency vector and the outline position 
data, a reading selection unit 113 for comparing frequency vectors and typical vectors 
listed in the vector table 116, selecting typical vectors with a high degree of similarity, 
and outputting readings 121 corresponding to these typical vectors and facial character 
position data, a text data buffer 117 for storing the text data, and a frequency vector 
buffer 118 for storing the frequency vectors. 

As shown in FIG. 8, the characteristic extraction unit 112 comprises a frequency 
vector calculating unit 122 for scanning text data stored in the text buffer 117 over the 
range of the outline symbols, counting the frequency of occurrence of symbols listed in 
the characteristic symbol table 115 to obtain frequency vectors, and storing these 
frequency vectors in the frequency vector buffer 118, a characteristic symbol detection 
unit 124 for detecting whether or not characters currently being scanned are listed in the 
characteristic symbol table 115, and a normalization processor 123 for normalizing the 
frequency vectors. 

A description is now given of the tables used in each processing block. Three 
types of table are used in this embodiment: the outline symbol table 114, the 
characteristic symbol table 115 and the vector table 116. 

The outline symbol table 114 is the same as the outline symbol table shown in 
table 2, with right outline symbols and left outline symbols being listed, respectively. 

An example of the characteristic symbol table is shown in table 6. Symbols used 
in the facial character strings are listed in advance in the characteristic symbol table. In 
this characteristic symbol table, one record consists of a characteristic symbol and the 
number of the group to which the characteristic symbol belongs (with a plurality being 
possible). This means that there is the same number of records as there are symbols 
listed. 

Table 6 

Symbol          Group number 
(illegible)     1 
n               1 
(illegible)     1 
(illegible)     1 
0               2 
•               2 
o               2 
(illegible)     (illegible) 
(illegible)     3, 4 
(illegible)     3, 4 
(illegible)     4 
(illegible)     (illegible) 
X               4 
(illegible)     4, 5 
#               5 
T               6 
5               6 

A description is now given of the groups to which the characteristic symbols 
belong. A group is a collection of characteristic symbols used in such a manner as to 
have the same nuance. For example, the characteristic symbols of group number 1 show 
a group of symbols meaning "smile". Further, the symbol "j*" is often used in facial 
characters meaning "mistake" and "angry" and therefore also belongs to a second group. 
The grouping of the symbols in the table is decided by experimentation based on the 
shape of the symbols. 

FIG. 9 shows an outline view of a vector table. The vector table is composed of 
typical vectors made automatically in advance from a large amount of facial character 
data. Readings are then assigned to each listed vector according to the frequency 
distribution of the characteristic symbols of the recorded vectors. Numeral 151 and 
numeral 153 in FIG. 9 are typical vectors showing the nuances of certain facial 
characters. For example, for 151, the reading (I give up) is assigned to the vector 152, 
which is a typical vector for the category meaning "mistake". Similarly, for 153, the 
reading (smiling) is assigned to the vector 154, which is a typical vector for the 
category meaning "smile". 

The method of making the vector table is now described. The vector table has to 
be prestored and comprises a plurality of typical vectors, as described previously. These 
typical vectors are made and entered into a single table. A typical vector can easily be 
made using an existing algorithm. In this embodiment, an LBG algorithm is employed. 
In the following description, the steps from (C3) onwards correspond to the LBG 
algorithm. Because the character string length of facial characters is short, it is difficult 
to obtain a meaningful degree of similarity between vectors when frequency vectors are 
simply used without modification. For this reason, in (C2), the number of appearances 
is also added for all of the characteristic symbols belonging to the same group as a 
symbol that appears. 

(C1) A large amount of facial character data is collected together. 

(C2) Characters used in each item of facial character data are then converted to 
frequency vectors using the characteristic symbol table 115. Specifically, the following 
procedure is followed. 

(C2-1) If symbols listed in the characteristic symbol table 115 exist within the outline 
symbols, the numbers of times the symbols appear are set as the frequency vector. 
However, when the numbers of appearances are set, it is assumed that not only the 
symbol itself but also all of the symbols within the group to which the symbol belongs 
have appeared. For example, when "Pi" appears within the inputted facial characters, 
according to the characteristic symbol table (refer to table 6), the symbol belongs to 
group 1. Therefore, not only is the number of appearances of "Pi" increased, but also 
the number of appearances of all of the symbols other than "Pi" belonging to the same 
group is increased. 

(C2-2) The frequency vector obtained in this manner is normalized. This is 
achieved by dividing the value for each element by the maximum element value for the 
vector, with the purpose of suppressing variation in the magnitude of the frequency 
vectors occurring due to the number of characters in the facial characters. 
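
Steps (C2-1) and (C2-2) can be sketched as follows. This is a minimal sketch under stated assumptions: the symbol-to-group table below is hypothetical, the function name `frequency_vector` is not from the source, and a symbol that shares any group with the scanned character is counted once per occurrence of that character.

```python
# Hypothetical characteristic symbol table: symbol -> group numbers.
SYMBOL_GROUPS = {
    "^": (1,), "n": (1,),
    "o": (2,), "0": (2,),
    "T": (6,), ";": (6,),
}
SYMBOL_ORDER = list(SYMBOL_GROUPS)      # fixes the vector dimensions


def frequency_vector(inner, table=SYMBOL_GROUPS, order=SYMBOL_ORDER):
    """Build a normalized frequency vector for the text inside the outline."""
    counts = dict.fromkeys(order, 0)
    for ch in inner:
        if ch not in table:
            continue
        groups = set(table[ch])
        # (C2-1) count ch and every symbol that shares a group with it
        for sym in order:
            if groups & set(table[sym]):
                counts[sym] += 1
    peak = max(counts.values())
    if peak == 0:                       # no characteristic symbol found
        return [0.0] * len(order)
    # (C2-2) normalize by the maximum element value
    return [counts[sym] / peak for sym in order]
```

With this table, the inner string "^o^" gives a count of 2 for both symbols of group 1 and a count of 1 for both symbols of group 2, so faces that use different symbols of the same group produce the same vector.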

(C3) The extracted frequency vector is inputted to an LBG algorithm and a typical 
vector is outputted. The following is a simple description of the flow when making a 
typical vector according to the LBG algorithm processing procedure. 

(C3-1) The required number of typical vectors and the control parameters are set.
(C3-2) An initial centroid C1 is made from the inputted frequency vectors.
Specifically, the initial centroid C1 is the mean value of all of the frequency vectors.

(C3-3) The number of centroids is doubled (centroid division processing).
Specifically, each current centroid Ck (where k is an integer between 1 and the current
centroid number n) is made into two centroids Ck and Ck+n using a random vector r
(having the same number of dimensions as the centroid Ck) and a control parameter S (a
scalar quantity). For example, when the current centroid number is 2, new centroids C1
and C3 are made based on the centroid C1, and new centroids C2 and C4 are then made
based on the centroid C2.

(C3-4) The centroids that have been doubled in (C3-3) are classified and brought to
the most appropriate state (centroid updating process). Specifically, the frequency
vectors made in (C2) are subjected to vector quantization using the current centroids,
and the centroids are repeatedly corrected until the quantization error Ei during this
time is smaller than a preset threshold value E.

(C3-5) The process is complete when the current centroid number reaches the final
typical vector number N set in (C3-1). If the current centroid number is less than N,
the process (C3-3) is returned to.
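
The flow of (C3-1) to (C3-5) can be sketched in Python as follows. This is only an
illustration under stated assumptions: the split is taken to be Ck + S*r and Ck - S*r
with r drawn uniformly from [-1, 1], the quantization error is taken to be the mean
squared error, an iteration cap guards against a threshold that is never reached, and
the final number N is assumed to be a power of two. None of these details, nor the
names lbg, s, threshold and max_iter, are given in the description above.

```python
import random
from typing import List

Vector = List[float]


def sq_error(x: Vector, c: Vector) -> float:
    """Squared Euclidean distance between a vector and a centroid."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest(x: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda k: sq_error(x, centroids[k]))


def lbg(vectors: List[Vector], n_final: int, s: float = 0.01,
        threshold: float = 1e-3, max_iter: int = 50, seed: int = 0) -> List[Vector]:
    rng = random.Random(seed)
    centroids = [mean_vector(vectors)]              # (C3-2) initial centroid C1
    while len(centroids) < n_final:                 # (C3-5) stop at N centroids
        # (C3-3) centroid division: Ck -> Ck and Ck+n (assumed form: C +/- S*r)
        plus, minus = [], []
        for c in centroids:
            r = [rng.uniform(-1.0, 1.0) for _ in c]
            plus.append([ci + s * ri for ci, ri in zip(c, r)])
            minus.append([ci - s * ri for ci, ri in zip(c, r)])
        centroids = plus + minus                    # index k and index k+n
        # (C3-4) centroid updating: quantize, then move centroids to cell means
        for _ in range(max_iter):
            cells: List[List[Vector]] = [[] for _ in centroids]
            for x in vectors:
                cells[nearest(x, centroids)].append(x)
            # assumption: a centroid with an empty cell is kept unchanged
            centroids = [mean_vector(cell) if cell else c
                         for cell, c in zip(cells, centroids)]
            error = sum(sq_error(x, centroids[nearest(x, centroids)])
                        for x in vectors) / len(vectors)
            if error < threshold:
                break
    return centroids
```

Placing the two halves of the split as plus + minus keeps the numbering used in the
text: the centroid derived from Ck stays at position k and its partner lands at
position k+n.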

(C4) Readings are assigned to the typical vectors made in the processing up to this
point. Specifically, the following procedure is obeyed.

(C4-1) All of the frequency vectors made in (C2) are classified using the typical
vectors obtained in (C3).

(C4-2) For every typical vector, the reading of the characteristic vector that is most
similar to the typical vector, from among the characteristic vectors classified into the
category of that typical vector, is taken as the reading for the typical vector.
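
Steps (C4-1) and (C4-2) can be illustrated with the short Python sketch below. It
assumes that each collected frequency vector already carries a reading, and that
similarity is measured by the squared error; the description does not state the
similarity measure used at this stage, and the name assign_readings is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def squared_error(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared error between a frequency vector and a typical vector."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def assign_readings(samples: List[Tuple[Sequence[float], str]],
                    typical: List[Sequence[float]]) -> List[Optional[str]]:
    """Return one reading per typical vector (None if its category is empty)."""
    best_err = [float("inf")] * len(typical)
    readings: List[Optional[str]] = [None] * len(typical)
    for vec, reading in samples:
        # (C4-1) classify the sample into the category of its nearest typical vector
        errs = [squared_error(vec, c) for c in typical]
        k = errs.index(min(errs))
        # (C4-2) within that category, keep the reading of the closest sample
        if errs[k] < best_err[k]:
            best_err[k] = errs[k]
            readings[k] = reading
    return readings
```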

The operation of the facial character determining unit is now described. Characters
are scanned from the left end using the outline symbol table shown in table 2 and an
outline position is extracted. However, an upper limit is set on the number of characters
between the outline symbols, so that only character strings of a length typical of facial
characters are assumed to be facial characters (the specific processing procedure is the
same as for the first embodiment).

An example of results of facial character determination processing is shown in
FIG. 10. In FIG. 10, the position ps (163) of the left outline symbol and the position pe
(164) of the right outline symbol are extracted. This text data is stored in the text buffer
117, and ps (the left outline symbol address information) and pe (the right outline
symbol address information) are sent to the characteristic extraction unit 112.
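
The scan for ps and pe might look like the following Python sketch. The outline symbols,
the upper limit MAX_INNER and the function name find_outlines are placeholders chosen
for illustration; the actual outline symbol table is table 2 and the actual procedure is
that of the first embodiment, neither of which is reproduced here.

```python
from typing import List, Tuple

LEFT_OUTLINES = {"("}    # hypothetical stand-in for the left outline symbols
RIGHT_OUTLINES = {")"}   # hypothetical stand-in for the right outline symbols
MAX_INNER = 8            # hypothetical upper limit on characters between outlines


def find_outlines(text: str) -> List[Tuple[int, int]]:
    """Scan from the left end and return (ps, pe) index pairs."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(text):
        if text[i] in LEFT_OUTLINES:
            # look for a right outline with at most MAX_INNER characters in between
            limit = min(len(text), i + MAX_INNER + 2)
            for j in range(i + 1, limit):
                if text[j] in RIGHT_OUTLINES:
                    spans.append((i, j))
                    i = j  # continue scanning after the right outline
                    break
        i += 1
    return spans
```

The length limit is what keeps an ordinary parenthesized remark from being treated as a
facial character in this sketch.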

The operation of the characteristic extraction unit 112 will now be described. At
the characteristic extraction unit, frequency vectors are made according to the following
procedure and sent to the reading selection unit 113. As described for the method of
making the vector table, in order to resolve the problem of the short character string
length of facial characters, (D1) below includes an operation that increases the number
of appearances of all of the characteristic symbols belonging to the same group.

(D1) The frequency of symbols within the inputted facial character data, in the range
given by the outline symbol position data outputted from the outline extraction unit, is
calculated. Specifically, this is as follows.

(D1-1) The scanning pointer p is aligned with the left outline symbol position ps
extracted using the outline extraction unit.

(D1-2) The following steps are repeated until the scanning pointer p reaches the
right outline symbol position pe extracted by the outline extraction unit.

(D1-3) The characteristic symbol table is searched for the character pointed to by
the scanning pointer p. If the character is listed, the number of appearances of all of
the characteristic symbols belonging to the same group as that characteristic symbol is
increased by one.

(D1-4) The scanning pointer p is advanced to the right by one character, and
(D1-2) is returned to.

An example of the frequency vectors made in the process (D1) is shown in FIG.
11, i.e. frequency vectors made from the character strings of FIG. 10 are shown.

(D2) The frequency vectors made in the process (D1) are normalized.

The reason for executing this normalization process is as described above.
Specifically, each element is divided by the maximum frequency stored in the frequency
vector buffer. The frequency vector made in (D2) therefore has a maximum value of 1
and the same shape as in FIG. 11.

(D3) The normalized frequency vector is stored in the frequency vector buffer 118,
and its start address and the outline position data are sent to the reading selection unit
113.

The operation of the reading selection unit 113 will now be described. At the
reading selection unit, readings are acquired from frequency vectors made using the
characteristic extraction unit in accordance with the following procedure.

(E1) A typical vector that is most similar to the inputted frequency vector is
obtained in the following process.

(E1-1) A counter k is initialized to 1.

(E1-2) The following process is repeated until the counter k reaches the typical
vector number M.

(E1-3) An error Ek between the kth typical vector listed in the vector table 116 and
the frequency vector outputted from the characteristic extraction unit is calculated. The
error Ek is calculated in accordance with the following equation,

\[ E_k = \sum_{i=1}^{n} (X_i - C_{k,i})^2 \tag{1} \]

where Xi is the ith element of the inputted frequency vector and Ck,i is the ith
element of the kth typical vector.

(E1-4) The counter k is set to k+1, and (E1-2) is returned to.

(E2) A reading allotted to the typical vector selected in (E1) is acquired, and this
reading and facial character position data (start and end outline position in text data)
are outputted.
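
The selection loop and equation (1) can be sketched in Python as below. The sketch
assumes that "most similar" means the typical vector with the smallest error Ek, and
that the vector table is held as a list of (typical vector, reading) pairs; the name
select_reading and this table layout are illustrative only.

```python
from typing import List, Sequence, Tuple


def squared_error(x: Sequence[float], c: Sequence[float]) -> float:
    """Equation (1): Ek = sum over i of (Xi - Ck,i)^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def select_reading(x: Sequence[float],
                   vector_table: List[Tuple[Sequence[float], str]]) -> Tuple[int, str]:
    """Return (k, reading) for the typical vector with the smallest error."""
    best_k, best_err = 0, float("inf")
    # (E1-1) to (E1-4): k runs from 1 to the typical vector number M
    for k, (c, _reading) in enumerate(vector_table, start=1):
        err = squared_error(x, c)
        if err < best_err:
            best_k, best_err = k, err
    # (E2) acquire the reading allotted to the selected typical vector
    return best_k, vector_table[best_k - 1][1]
```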

FIG. 12 shows the typical vector determined to be the most similar to the frequency
vector of FIG. 11. In this typical vector, values are entered at the locations of the
symbol group meaning "angry" and "mistake" and of the symbol group meaning "smile",
and the assigned reading is "Don't be silly!".

As described above, according to the second embodiment, combinations of
characteristic primitives for inputted facial character data are put into the form of
vectors using the number of appearances of characters. Reference vectors for frequency
vectors are prepared in advance based on a large amount of facial character data. The
vector made from the inputted data is then compared with these typical vectors, and the
reading of the most similar typical vector is outputted. This means that assignment of
readings to facial characters is possible without registering facial character patterns.

Third Embodiment

The overall device configuration is the same as for the first and second 
embodiments, with the exception that the internal configuration of the facial character 
reading assignment unit 12 is different. 

The configuration of the facial character reading assignment unit of this embodiment
is now described. This configuration is shown in FIG. 13. The facial character
reading assignment unit of this embodiment comprises a facial character determining
unit 191 for receiving text data 199 and extracting outline position data 200 using an
outline symbol table 194, a characteristic extraction unit 192 for receiving the outline
position data and making frequency vectors using a characteristic symbol table 195, a
reading selection unit 193 for comparing the frequency vectors with typical vectors
listed in the vector table 196, selecting typical vectors with a high degree of similarity,
and outputting readings 201 corresponding to these selected typical vectors together
with facial character position data, a text data buffer 197 for storing the text data, and a
frequency vector buffer 198 for storing the frequency vectors.

FIG. 14 is a view showing the details of a configuration for the characteristic 
extraction unit 192. The characteristic extraction unit 192 comprises a frequency vector 
calculating unit 202 for scanning text data stored within the text buffer within the range 
of the outline symbols and storing the number of appearances of certain symbols in the 
characteristic symbol table in a frequency vector buffer, a characteristic symbol 
detection unit 205 for searching whether or not symbols stored in the text buffer are 
listed in the characteristic symbol table, a filter unit 203 for smoothing frequency 
vectors stored in the frequency vector buffer, and a normalization processor 204 for 
normalizing frequency vectors. 

The following processing is carried out at the filter unit 203 (in this embodiment,
n=1).

\[ Y_i' = \frac{\sum_{k=-n}^{n} (n - |k| + 1)\, Y_{i+k}}{\sum_{k=-n}^{n} (n - |k| + 1)} \tag{2} \]

where Yi is the value of the ith element of a frequency vector before filtering,
Yi' is the value of the ith element after filtering, and n is a variable indicating the
window size of the filter.
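
For n=1 the weights of equation (2) are 1, 2 and 1, and the divisor is 4. The Python
sketch below applies the filter for a general n. The treatment of elements beyond either
end of the vector (taken here as zero) is an assumption, since the description does not
state how the boundaries are handled, and the function name smooth is hypothetical.

```python
from typing import List


def smooth(y: List[float], n: int = 1) -> List[float]:
    """Equation (2): triangular smoothing with window size n."""
    weights = {k: n - abs(k) + 1 for k in range(-n, n + 1)}
    total = sum(weights.values())  # denominator of equation (2)
    out: List[float] = []
    for i in range(len(y)):
        acc = 0.0
        for k, w in weights.items():
            j = i + k
            if 0 <= j < len(y):  # assumption: out-of-range elements count as zero
                acc += w * y[j]
        out.append(acc / total)
    return out
```

For example, under these assumptions the counts [0, 2, 0, 0, 1] become
[0.5, 1.0, 0.5, 0.25, 0.5], so that the neighbors of each symbol that appeared also
receive a value.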

A description is now given of the tables used in each processing block. Three 
types of table are used in this embodiment, the outline symbol table 194, the 
characteristic symbol table 195 and the vector table 196. 

The outline symbol table is the same as that shown in table 2, with right outline
symbols and left outline symbols being listed, respectively.

An example of the characteristic symbol table 195 is shown in table 7. Symbols
used in the facial character strings are listed in advance in the characteristic symbol
table. In this characteristic symbol table, characteristic symbols with similar shapes, or
characteristic symbols used with similar meanings, are lined up near to each other.
Further, it is preferable to register as many symbols that may be used in facial
characters as possible, from the point of view of providing compatibility with facial
characters that may continue to increase thereafter. This table is made through
experimentation.



Table 7 (characteristic symbol table; column heading: Symbol)

FIG. 15 shows an example of a vector table. The vector table is composed of a
plurality of vectors that are made from a large amount of facial character data and
listed in advance. A reading is assigned to each listed vector according to the frequency
distribution of the characteristic symbols of that vector.

The method of making the vector table is now described. This vector table consists
of a plurality of typical vectors. These typical vectors can be made in a straightforward
manner using existing algorithms. An LBG algorithm is employed in this embodiment.
As described above, it is difficult to obtain any degree of similarity between vectors
when frequency vectors are simply used without modification, because the character
string length of the facial characters is short. As with the method for making the vector
table of the second embodiment, (F2) therefore includes an operation that also raises the
number of appearances of the characteristic symbols held in neighboring elements.

(F1) A large amount of facial character data is collected together.

(F2) Characters used in each item of facial character data are then converted to
frequency vectors using the characteristic symbol table 195.

In order to compensate for the insufficient amount of information in the vector data
caused by the small number of characters in facial characters, the vector data is
processed by the smoothing filter 203 and then normalized using the maximum
frequency. The smoothing filter updates the vector values according to equation (2).
This processing therefore increases the number of appearances of characteristic symbols
of similar shape that are lined up next to each other.

(F3) The extracted frequency vector is inputted to an LBG algorithm and a typical 
vector is outputted. 

The following is a simple description of the flow when making a typical vector 
according to the LBG algorithm processing procedure. 

(F3-1) The required number of typical vectors and the control parameters are set.
(F3-2) An initial centroid C1 is made from the inputted frequency vectors.
Specifically, the initial centroid C1 is the mean value of all of the frequency
vectors.

(F3-3) The number of centroids is doubled (centroid division processing).

Specifically, each current centroid Ck (where k is an integer between 1 and the
current centroid number n) is made into two centroids Ck and Ck+n using a random
vector r (having the same number of dimensions as the centroid Ck) and a control
parameter S (a scalar quantity). For example, when the current centroid number is 2,
new centroids C1 and C3 are made based on the centroid C1, and new centroids C2 and
C4 are then made based on the centroid C2.

(F3-4) The centroids that have been doubled by processing (F3-3) are classified and
brought to the most appropriate state (centroid updating process).

Specifically, the inputted frequency vectors are subjected to vector quantization
using the current centroids, and the centroids are repeatedly corrected until the
quantization error Ei during this time is smaller than a preset threshold value E.

(F3-5) The process is complete when the current centroid number reaches the final
typical vector number N set in processing (F3-1). If the current centroid number is less
than N, then (F3-3) is returned to.

(F4) Readings are assigned to the typical vectors made in the processing up to the
steps above.

Specifically, the following procedure is obeyed.

(F4-1) All of the frequency vectors made from the inputted facial character data
are classified using the typical vectors obtained in (F3).

(F4-2) For every typical vector, the reading of the characteristic vector that is most
similar to the typical vector, from among the characteristic vectors classified into the
category of that typical vector, is taken as the reading for the typical vector.

The operation of the facial character determining unit is now described. Characters
are scanned from the left end using the outline symbol table of table 2 and an outline
position is extracted. However, an upper limit is set on the number of characters
between the outline symbols, so that only character strings of a length typical of facial
characters are assumed to be facial characters. An example of results of facial character
determination processing is shown in FIG. 16. In FIG. 16, the position ps (242) of the
left outline symbol and the position pe (243) of the right outline symbol are extracted.
Here, ps and pe are then sent to the characteristic extraction unit.

The operation of the characteristic extraction unit will now be described. At the
characteristic extraction unit, frequency vectors are made according to the following
procedure and sent to the reading selection unit.

(G1) The frequency of symbols within the inputted facial character data, in the range
given by the outline symbol position data outputted from the outline extraction unit, is
calculated. Specifically, this is as follows.

(G1-1) The scanning pointer p is aligned with the left outline symbol position ps
extracted using the outline extraction unit.

(G1-2) The following steps are repeated until the scanning pointer p reaches the
right outline symbol position pe extracted by the outline extraction unit.

(G1-3) The characteristic symbol table is searched for the character pointed to
by the scanning pointer p. If the character is listed, the number of appearances of that
characteristic symbol is incremented by one.

(G1-4) The scanning pointer p is advanced to the right by one character, and
(G1-2) is returned to.

An example of a frequency vector made based on FIG. 16 is shown in FIG. 17. It
can be seen that the symbol "D" appears two times and the symbol " appears once.

(G2) The frequency vectors made using the processing in (G1) are subjected to
filtering and then normalized using the maximum appearance value. It is difficult to
obtain any degree of similarity between vectors when frequency vectors are simply used
without modification, because the character string length of the facial characters is
short. Symbols that are similar in shape are therefore arranged in advance so as to be
lined up close to each other. When an arbitrary symbol then appears, the similarity
between vectors can be increased by also increasing the number of appearances of the
surrounding symbols using filtering. FIG. 18 shows the results of subjecting the vectors
in FIG. 17 to smoothing processing and to normalization processing. It can be
understood that by adding smoothing processing, values appear not just for the symbol
"D" but also for the symbols " " ^" that are also often used so as to have the same
meaning.

(G3) Normalized characteristic vectors and outline position data are sent to the
reading selection unit 193.

The operation of the reading selection unit 193 will now be described. At the
reading selection unit, readings are acquired from frequency vectors made using the
characteristic extraction unit in accordance with the following procedure.

(H1) A typical vector that is most similar to the inputted frequency vector is
obtained in the following process.

(H1-1) A counter k is initialized to 1.

(H1-2) The following process is repeated until the counter k reaches the typical
vector number M.

(H1-3) An error Ek between the kth typical vector listed in the vector table 196 and
the frequency vector outputted from the characteristic extraction unit is calculated in
accordance with equation (1).

(H1-4) The counter k is set to k+1, and (H1-2) is returned to.

(H2) A reading allotted to the typical vector selected in (H1) is acquired, and this
reading and facial character position data (start and end outline position in text data) are
outputted.

As described above, according to the third embodiment, combinations of
characteristic primitives for inputted facial character data are put into the form of
vectors using the number of appearances of characters. A table of reference vectors for
frequency vectors is made in advance based on a large amount of facial character data.
The vector made from the inputted data is then compared with these typical vectors, and
the reading of the most similar typical vector is outputted. This means that assignment
of readings to facial characters is possible by taking into consideration combinations of
characteristic primitives without registering facial character patterns.

Further, the processing of this embodiment only employs simple filtering. This
means that both processing speed and mounting efficiency can be improved.